Investigating potential novel therapeutic targets and biomarkers for ankylosing spondylitis using plasma protein screening

Background: Ankylosing spondylitis (AS) is a chronic inflammatory disease affecting the spine and sacroiliac joints. Recent genetic studies suggest that certain plasma proteins may play a causal role in AS development. This study aims to identify and characterize these proteins using Mendelian randomization (MR) and colocalization analyses.
Methods: Plasma protein data were obtained from recent publications in Nature Genetics, integrating data from five previous GWAS datasets, including 738 cis-pQTLs for 734 plasma proteins. GWAS summary data for AS were sourced from IGAS and other European cohorts. MR analyses were conducted using “TwoSampleMR” to assess causal links between plasma protein levels and AS. Colocalization analysis was performed with the coloc R package to identify shared genetic variants. Sensitivity analyses and protein-protein interaction (PPI) network analyses were conducted to validate findings and explore therapeutic targets. We performed a phenome-wide association study (PheWAS) to examine the potential side effects of targeting the candidate drug proteins.
Results: After FDR correction, eight significant proteins were identified: IL7R, TYMP, IL12B, CCL8, TNFAIP6, IL18R1, IL23R, and ERAP1. Elevated levels of IL7R, IL12B, CCL8, IL18R1, IL23R, and ERAP1 increased AS risk, whereas elevated TYMP and TNFAIP6 levels decreased AS risk. Colocalization analysis indicated that IL23R, IL7R, and TYMP likely share causal variants with AS. PPI network analysis identified IL23R and IL7R as potential new therapeutic targets.
Conclusions: This study identified eight plasma proteins with significant associations with AS risk, suggesting IL23R, IL7R, and TYMP as promising therapeutic targets. Further research is needed to explore underlying mechanisms and potential for drug repurposing.


Introduction
Ankylosing spondylitis (AS) is a disabling chronic arthritis resulting from a combination of factors. Genetic susceptibility does not fully explain the etiology of AS, and how the interplay of genes, sex, microorganisms, mechanical stress, and additional lifestyle and environmental factors increases susceptibility to AS is unclear, which adds to the complexity of treatment (1). Targeted biologic therapies are the mainstay treatment for patients with AS who are difficult to treat with nonsteroidal anti-inflammatory drugs (2). In recent years, the successful introduction of monoclonal antibodies targeting interleukin (IL)-17A and tumor necrosis factor-α (TNF-α) has highlighted several key pathological pathways in AS. Nonetheless, <50% of patients respond favorably to IL-17A and TNF-α blockade. Currently, curative treatment for AS is lacking, and many patients have to manage symptoms through lifelong medication, which can induce adverse reactions (3). Consequently, exploring novel diagnostic biomarkers and therapeutic agents is crucial to provide more effective diagnostic methods and therapeutic options for patients with AS, thereby improving patient prognosis.
Human proteins are crucial in diverse biological processes and represent the main drug targets. Nelson et al. (4) reported that protein drug targets supported by genetic association evidence are more likely to be approved for marketing, possibly by a factor of two. In recent years, Mendelian randomization (MR) analyses have been extensively applied to develop and repurpose drug targets (5). MR uses single nucleotide polymorphisms (SNPs) from genome-wide association studies (GWAS) as genetic instrumental variables to estimate the causal relationship between an exposure and an outcome. In contrast to observational studies, MR is less affected by confounding factors. Progress in plasma high-throughput genomic and proteomic technologies has allowed MR-based strategies to facilitate the identification of candidate therapeutic targets for diverse disorders (e.g., type 1 diabetes and breast cancer) (6,7). However, MR analyses that integrate GWAS with protein quantitative trait locus (pQTL) data, an approach that could provide important insights for early disease diagnosis and drug target discovery, are lacking in AS research. Consequently, the present study aimed to integrate large-scale GWAS with pQTL data using MR analysis to investigate plasma proteins that could be candidate biomarkers or therapeutic targets of AS.
To support more precise management of AS, we searched for potential drug targets among plasma proteins. Figure 1 shows the study design. First, a two-sample MR analysis was performed, and eight potential drug proteins were screened after FDR correction. The plasma protein data were extracted from plasma proteomics-related publications, whereas the AS data were obtained from the GWAS data of the International Genetics of Ankylosing Spondylitis Consortium (IGAS) (8). Second, we performed colocalization analysis to verify the robustness of the genetic associations between plasma proteins and AS. Third, the relationship between the identified proteins was analyzed using the protein-protein interaction (PPI) network to identify potential therapeutic targets. Fourth, a sensitivity analysis, including replication and meta-analysis, was conducted to ensure the accuracy and directionality of the identified associations. Finally, we assessed the potential adverse effects of the identified drug proteins on other phenotypes using phenome-wide Mendelian randomization analysis.
Additionally, the F-statistic for each instrumental variable was determined using the formula F = R² × (N − 2)/(1 − R²), where R² is calculated as 2 × EAF × (1 − EAF) × β², EAF is the effect allele frequency, β is the estimated effect of the SNP on the protein level, and N is the sample size. This calculation helps to prevent biases due to weak instruments, and an F-statistic above 10 is deemed adequate to overcome such biases (15). SNPs with palindromic structures were systematically removed from the analysis.
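The instrument-strength calculation described above can be illustrated with a short Python sketch. The function name and the input values are hypothetical and chosen only for illustration; they are not taken from the study data.

```python
def instrument_strength(beta, eaf, n):
    """Return (R2, F) for a single SNP instrument.

    beta: estimated SNP effect on the protein level (assumed standardized)
    eaf:  effect allele frequency
    n:    sample size of the pQTL study
    """
    # Proportion of variance in the exposure explained by the SNP
    r2 = 2.0 * eaf * (1.0 - eaf) * beta ** 2
    # F-statistic as defined in the text
    f_stat = r2 * (n - 2) / (1.0 - r2)
    return r2, f_stat


# Hypothetical SNP, not from the study
r2, f_stat = instrument_strength(beta=0.2, eaf=0.3, n=10000)
is_strong = f_stat > 10  # conventional weak-instrument threshold
```

With these illustrative inputs the SNP explains about 1.7% of the variance and comfortably exceeds the threshold of 10.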

Sources of outcome data
We obtained a GWAS summary dataset from the IGAS for the preliminary analysis of AS, which included 19,688 patients and 15,145 controls. GWAS summary data for AS were also obtained from multiple independent cohort studies, all of European ancestry. No participating studies overlapped between the GWASs for plasma protein levels and AS.

MR analysis
Our analysis complies with the STROBE-MR guidelines, and we have included the comparative report as Supplementary Table S6. We performed Mendelian randomization studies to assess the causal links between plasma protein levels and AS. In MR, genetic variants are used as proxies for risk factors. Therefore, the instrumental variables (IVs) chosen must meet three essential criteria to ensure valid causal inference: (1) the genetic variants must be directly linked to the exposure; (2) the genetic variants should not be linked to any confounders that could affect the relationship between the exposure and the outcome; (3) the genetic variants should affect the outcome only through the exposure and not via any other pathways.
Plasma proteins and AS were utilized as the exposure and outcome, respectively. MR analysis was conducted using "TwoSampleMR" (https://github.com/MRCIEU/TwoSampleMR). If there was only one pQTL for a given protein, the Wald ratio was used. If at least two genetic instruments were available, inverse variance-weighted MR (IVW-MR) was used, followed by a heterogeneity analysis (16). In the preliminary analysis, we applied the false discovery rate (FDR) correction for multiple testing to select potentially effective causal proteins. Sensitivity analyses were performed to verify the robustness of the results. Steiger filtering was also performed to verify the direction of the association between the plasma proteins and AS.
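The choice between the Wald ratio and IVW can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the TwoSampleMR implementation: the IVW estimator here is the simple fixed-effect form, the Wald ratio standard error uses a first-order approximation, and all function names and inputs are hypothetical.

```python
import math


def wald_ratio(beta_exp, beta_out, se_out):
    """Single-instrument estimate: SNP-outcome effect over SNP-exposure effect."""
    estimate = beta_out / beta_exp
    # First-order approximation that ignores uncertainty in beta_exp
    se = se_out / abs(beta_exp)
    return estimate, se


def ivw_fixed(beta_exp, beta_out, se_out):
    """Fixed-effect inverse variance-weighted estimate across instruments."""
    weights = [bx ** 2 / s ** 2 for bx, s in zip(beta_exp, se_out)]
    weighted = sum(bx * by / s ** 2
                   for bx, by, s in zip(beta_exp, beta_out, se_out))
    total = sum(weights)
    return weighted / total, math.sqrt(1.0 / total)


def two_sided_p(estimate, se):
    """Two-sided p-value from a normal approximation."""
    return math.erfc(abs(estimate / se) / math.sqrt(2.0))


def mr_estimate(beta_exp, beta_out, se_out):
    """Use the Wald ratio for one instrument, IVW for two or more."""
    if len(beta_exp) == 1:
        est, se = wald_ratio(beta_exp[0], beta_out[0], se_out[0])
    else:
        est, se = ivw_fixed(beta_exp, beta_out, se_out)
    return {"beta": est, "se": se, "or": math.exp(est),
            "p": two_sided_p(est, se)}


# Hypothetical summary statistics, not from the study
single = mr_estimate([0.5], [0.1], [0.02])
multi = mr_estimate([0.5, 0.25], [0.1, 0.05], [0.02, 0.02])
```

In this toy example both instruments imply the same ratio, so the IVW estimate equals the Wald ratio but has a smaller standard error.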

Colocalization analysis
Colocalization is an additional analysis that strengthens the results of genetic studies by seeking evidence that the same genetic variants are associated with both the exposure and the outcome. This helps to confirm that the results reflect a shared causal variant, rather than linkage disequilibrium (LD) or other confounding factors. We conducted colocalization analysis using the coloc R package (17). Colocalization analysis evaluates five hypotheses: H0, the genetic variant is not associated with either trait; H1, associated only with one trait; H2, associated only with the other trait; H3, associated with both traits but with different causal variants; and H4, associated with both traits and with the same causal variant. We focused on proteins with a posterior probability for H4 (PPH4) of 0.90 or greater (18).
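To make the five hypotheses concrete, the sketch below shows how posterior probabilities for H0 to H4 can be combined from per-SNP log approximate Bayes factors, following the general approach used in approximate Bayes factor colocalization. It is an illustrative Python sketch rather than the coloc package itself: the function names are hypothetical, the per-SNP log Bayes factors are assumed to be precomputed, and the prior values (p1 = p2 = 1e-4, p12 = 1e-5) are assumed defaults that the text does not report.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def logdiffexp(a, b):
    """log(exp(a) - exp(b)) for a >= b; returns -inf when they coincide."""
    if a <= b:
        return float("-inf")
    return a + math.log(-math.expm1(b - a))


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 from per-SNP log Bayes factors.

    labf1, labf2: natural-log Bayes factors for trait 1 and trait 2,
                  one value per SNP in the same order.
    """
    s1 = logsumexp(labf1)
    s2 = logsumexp(labf2)
    s12 = logsumexp([a + b for a, b in zip(labf1, labf2)])
    log_h = [
        0.0,                                                    # H0
        math.log(p1) + s1,                                      # H1
        math.log(p2) + s2,                                      # H2
        math.log(p1) + math.log(p2) + logdiffexp(s1 + s2, s12),  # H3
        math.log(p12) + s12,                                    # H4
    ]
    denom = logsumexp(log_h)
    return {f"H{i}": math.exp(lh - denom) for i, lh in enumerate(log_h)}


# Toy region with three SNPs (hypothetical values)
shared = coloc_posteriors([0.0, 20.0, 0.0], [0.0, 20.0, 0.0])
distinct = coloc_posteriors([20.0, 0.0, 0.0], [0.0, 20.0, 0.0])
passes_threshold = shared["H4"] >= 0.90
```

When both traits peak at the same SNP, nearly all posterior mass falls on H4; when they peak at different SNPs, it falls on H3, which mirrors the PPH4 and PPH3 interpretation used in this study.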

Protein-protein and protein-drug association analysis
We utilized a protein-protein interaction (PPI) network to assess relevant plasma protein targets that were significantly related to AS susceptibility. We constructed a functional protein interaction network from the STRING database (https://cn.string-db.org). The minimum required interaction score in STRING was set to 0.4. Identifying the pathways targeted by a protein is important for the discovery of efficient drug compounds that can change target or downstream protein activity to terminate disease development. Consequently, the DGIdb database (https://dgidb.org/) was utilized to explore drug-gene interactions for the AS protein targets.

Phenome-wide MR
We used summary statistics of diseases from the UK Biobank cohort to perform phenome-wide Mendelian randomization analysis to investigate the potential side effects of the candidate drug targets. To ensure the accuracy and scalability of the analysis, the UK Biobank disease GWAS employed the generalized mixed model method (SAIGE V.0.29) to address the issue of unbalanced case-control ratios (19). Based on statistical power considerations, we selected 783 traits (diseases) with at least 500 cases for phenome-wide MR analysis. Subsequently, we conducted MR analysis using the IVW or Wald ratio method with the same parameters as in the primary analysis. If the FDR-corrected p-value was less than 0.05, the causal effect was considered statistically significant. The summary statistics for disease-associated SNPs were obtained from SAIGE GWAS (available for download at https://www.leelabsg.org/resources) (19).
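The FDR step can be illustrated with a small Python sketch. It assumes the Benjamini-Hochberg procedure, which the text does not name explicitly; the function name and the p-values are hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvals)
    # Walk from the largest p-value down, enforcing monotonicity
    order = sorted(range(n), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = n - position  # 1-based rank in ascending order
        running_min = min(running_min, pvals[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for four phenotypes
raw = [0.01, 0.04, 0.03, 0.005]
adj = bh_adjust(raw)
significant = [q < 0.05 for q in adj]
```

In a phenome-wide screen the same function would be applied to the full vector of MR p-values (783 traits here), so nominally significant results can lose significance after adjustment.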

Replication and meta-analysis
We repeated the MR analysis in another AS cohort, which included 1,462 cases of European ancestry and 164,682 controls of European ancestry, with 16,380,022 SNPs, to comprehensively evaluate the robustness of the candidate proteins identified by the above criteria. The GWAS data used for replication analysis of AS were obtained from the Finnish database and are included in the IEU GWAS repository (https://gwas.mrcieu.ac.uk/datasets/finn-b-M13_ANKYLOSPON/) with the GWAS ID finn-b-M13_ANKYLOSPON. The criteria for replication were that the MR estimate had the same direction of effect in both analyses and reached p<0.05 in the meta-analysis combining the results of the two GWAS. The meta-analysis was conducted using the R package "meta" (version 7.0-0).
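The replication criteria can be expressed as a short Python sketch. It assumes a fixed-effect inverse-variance meta-analysis on the log odds ratio scale; the text does not state whether a fixed-effect or random-effects model was used in "meta", and the function names and numbers below are hypothetical.

```python
import math


def fixed_effect_meta(estimates):
    """Pool (beta, se) pairs with inverse-variance weights."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    # Two-sided p-value from a normal approximation
    p_value = math.erfc(abs(pooled / pooled_se) / math.sqrt(2.0))
    return pooled, pooled_se, p_value


def is_replicated(beta_discovery, beta_replication, p_meta, alpha=0.05):
    """Same sign of effect in both analyses and meta-analysis p below alpha."""
    same_direction = (beta_discovery > 0) == (beta_replication > 0)
    return same_direction and p_meta < alpha


# Hypothetical log odds ratios and standard errors
discovery = (0.30, 0.10)
replication = (0.20, 0.20)
beta_meta, se_meta, p_meta = fixed_effect_meta([discovery, replication])
replicated = is_replicated(discovery[0], replication[0], p_meta)
```

The pooled estimate is pulled toward the more precise discovery result, and the sign check ensures that a significant pooled p-value cannot mask opposing effects in the two cohorts.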

Colocalization analysis of the eight significant proteins
We conducted colocalization analyses for the eight candidate proteins to further determine the likelihood of shared causal genetic variants associated with AS and pQTL. The results indicate that IL23R, IL7R, and TYMP are likely to share causal variants in this region (PPH4 > 0.90), making them the strongest candidate proteins for AS (Figure 3, Supplementary Table S3). On the other hand, ERAP1, IL18R1, CCL8, TNFAIP6, and IL12B are less likely to share causal variants with AS in this region (PPH4 < 0.90). The colocalization and gene track plots for these five proteins are shown in Supplementary Figure S1. Notably, although the PPH4 values for ERAP1 and IL18R1 are less than 0.9, their PPH3 values are close to 1. Thus, we believe that these two genes are associated with protein levels in this region and AS, but the evidence supports separate causal variants.

Relationships between candidate drug targets and AS
The PPI networks illustrate the interactions of three prioritized proteins (IL23R, IL7R, and TYMP) with two current AS drug targets (TNF-α and IL17), as shown in Supplementary Figure S2. Specifically, IL23R and IL7R are associated with TNF-α, which is targeted by infliximab and adalimumab. Additionally, the PPI network shows that IL23R and IL7R are associated with IL17, a target of ixekizumab (Supplementary Figure S3). This indicates the potential relevance of IL23R and IL7R as new therapeutic targets for AS, supported by their interactions with existing drug targets. We searched the DGIdb database (https://dgidb.org/) for current drugs targeting potential disease-causing proteins. Three potential disease-modifying drugs were identified: tipiracil hydrochloride targeting thymidine phosphorylase (TYMP), ruxolitinib targeting interleukin 7 receptor (IL7R), and celecoxib targeting interleukin 23 receptor (IL23R). All these medications are approved, highlighting their potential for repurposing in AS treatment (Supplementary Table S5).

Phenome-wide MR analysis of candidate drug-target proteins
To assess the potential beneficial or harmful effects of these three AS-related candidate proteins on other phenotypes, we conducted a phenome-wide association study. A comprehensive MR screening of 783 diseases or traits was performed using data from the UK Biobank. Overall, we identified 86 phenotypes that may have a causal relationship with the candidate proteins (P < 0.05), as shown in Supplementary Table S4. After FDR correction, there was almost no statistical evidence of adverse side effects for these candidate drug proteins, suggesting that their development appears to be safe (Figure 4).

Replication and meta-analysis
In the replication analysis, we used the original protein SNPs (Supplementary Table S1) as exposure instruments and AS data from the Finnish database as the outcome for MR analysis. The main analysis methods remained the Wald ratio or IVW. Additionally, we conducted a meta-analysis combining results from the replication cohort and the original results. Our criteria for replication analysis are: the direction of effect of the targeted protein on disease risk must be consistent (i.e., the sign of beta in both MR analyses should be the same), and the meta-analysis of the two GWAS results should reach p<0.05. In the replication cohort, we found that the MR results for IL23R were still significant (OR=1.81, 95% CI: 1.19-2.76, P=0.005), but it should be noted that IL7R and TYMP did not show statistically significant evidence in the replication cohort. The meta-analysis results indicate that although the combined p-values for IL7R and IL23R were significantly attenuated when discovery and replication results were combined, the MR results for the three candidate proteins remained significant. Specifically, IL23R (OR=1.42, 95% CI: 1.01-1.99, P-meta=0.044), IL7R (OR=1.04, 95% CI: 1.02-1.06, P-meta=0.00044), and TYMP (OR=0.91, 95% CI: 0.86-0.95, P-meta=8.52e-5). Additionally, the replication results for these three target proteins were consistent with the discovery findings regarding disease risk. The detailed results are shown in Figure 5.

Comparison with the study by Zhao et al.
The proteomics data used by Zhao et al. (20) came from the UK Biobank-PPP database, and the AS genetic association study data came from the R9 version of the Finnish database. Their study did not include a replication cohort. Given that the protein pQTL data and outcomes used in Zhao et al.'s study are different from those used in our study, we conducted a comparative analysis of their results. Figure 6 illustrates a comparison of protein MR effect estimates between our study and Zhao et al.'s study, categorizing proteins into three groups: proteins identified in both studies (Both), proteins identified only in our study (Our Only), and proteins identified only in Zhao et al.'s study (Zhao Only). We highlighted the p-values of the candidate proteins (IL23R, TYMP, and IL7R) in both studies. The figure shows that TYMP and IL7R have more significant p-values in our study compared to Zhao et al.'s study. In their study, the MR effects of TYMP and IL7R were available but not significant in the UK Biobank, which explains the absence of these two proteins in Zhao et al.'s results. As for IL23R, its absence in Zhao et al.'s study is due to the lack of measurement of this protein in the UK Biobank. Therefore, our research serves as an extension and enhancement of Zhao et al.'s findings.

Discussion
The development of new therapeutic agents for ankylosing spondylitis is challenging. One of the main reasons for this difficulty is the incomplete understanding of the pathophysiology of AS. The human proteome is the main source of therapeutic targets, and protein drug targets possess significant clinical value. Therefore, we conducted an integrative analysis based on prior GWAS to evaluate proteins that may be causal for AS and to identify novel therapeutic targets (21).
The "causality" detected through MR could instead reflect genetic confounding or horizontal pleiotropy because of LD (9). Thus, only cis-pQTLs were used as instruments to limit horizontal pleiotropy bias because they directly affect the transcription and translation of the gene of interest. In this study, we analyzed the potential pathogenic relationship between plasma proteins and the risk of AS and identified potential therapeutic targets among plasma proteins for AS. After adjusting for FDR, the plasma proteins ERAP1, IL12B, IL18R1, IL23R, IL7R, CCL8, TNFAIP6, and TYMP showed a causal relationship with the risk of AS. The Steiger filtering method was used to validate the directionality of the causal relationships (22). This analysis showed no evidence of reverse causation for the proteins identified in the primary analysis. This reinforces the finding that IL23R, IL7R, and TYMP are likely contributors to the pathogenesis of AS, suggesting that these proteins play a causative role in the disease's development rather than being a consequence of it. The PPI networks revealed that IL23R and IL7R interact with the current AS drug targets TNF-α and IL17, indicating their potential as new therapeutic targets. Additionally, database searches identified three approved drugs (tipiracil hydrochloride, ruxolitinib, and celecoxib) targeting TYMP, IL7R, and IL23R, respectively, suggesting their potential for repurposing in AS treatment (23). In the replication analysis, we attempted to replicate the study results using the FinnGen AS GWAS as the outcome. We found that the MR result for IL23R remained significant in the replication analysis. As for IL7R and TYMP, although the direction of the MR results was consistent with the discovery results, they did not reach significance in the replication cohort. This may be due to the smaller sample size of the replication cohort, which reduces statistical power. In addition, our MR PheWAS results indicate that developing these potential drug proteins appears to be safe, as there is little statistical evidence to suggest that they will produce harmful side effects (24).
IL23R is implicated in the IL-23/IL-17 axis, which plays a crucial role in the pathogenesis of AS. Studies have shown that IL-23 stimulates the expansion and maintenance of Th17 cells, leading to increased levels of IL-17, a proinflammatory cytokine involved in AS (25,26). Genetic variants in IL23R have been associated with AS susceptibility, highlighting it as a potential therapeutic target. IL7R is essential for T-cell development and [...] studies, with ERAP1 suggested to be the second most potent gene related to AS (28,29). This further supports the reliability of our finding of ERAP1 as a drug target for the treatment of AS. ERAP1 and HLA-B27 contribute to 70% of familial genetic factors for AS (30). ERAP1 polymorphisms have been shown to affect AS susceptibility in HLA-B27+ people (31). Thus, ERAP1 can exert synergistic effects with factors such as HLA-B27 and is related to abnormal peptide processing and incorrect antigen presentation, leading to AS susceptibility. According to one case-control association study, protective genetic variants are related to a decrease in ERAP1 and ERAP2 function and inhibition of cell surface major histocompatibility complex I expression (8). ERAP1 and ERAP2 variants can affect the number of associated peptides and reduce the HLA-B27 folding rate, thus aggravating endoplasmic reticulum (ER) stress while accelerating AS progression (32). Mei et al. (33) reported that AS patients had higher serum IL-17 and IL-23 levels than did normal participants. IL-23, an IL-12-associated cytokine, may promote T helper (Th)17 cell growth and differentiation (34). Th17 cells are involved in the pathogenesis of AS (35) and are also implicated in psoriasis, inflammatory arthritis, and Crohn's disease (36-38). IL-12B is likely related to the pathogenesis of AS because it acts on IL-23R+ CD4+ T cells and preferentially stimulates these cells to release IL-17 (39).
The present MR-based study helps address the inability of previous animal experiments to infer causality. Moreover, we identified novel AS-associated proteins (TYMP, CCL8, and TNFAIP6). TNFAIP6, also known as TSG-6, is a protein induced by TNF-α that is secreted during acute inflammatory responses and is associated with inflammation and tissue remodeling (40). Previous studies have reported that TNFAIP6 has anti-inflammatory effects in an experimental mouse model of arthritis (41), but further evidence to support this observation is lacking. The present MR study suggested that TNFAIP6 may have similar anti-inflammatory protective effects on AS, providing support for the TNFAIP family as candidate causal genes for AS. CCL8 is a monocyte chemokine that can interact with CCR1, CCR2B, and CCR3; it also regulates tumor occurrence, antiviral infections, and inflammatory immunity in the host (42). Consistent with the present findings, one study suggested that CCL8 is upregulated in the serum of patients with AS and may be a useful biomarker for predicting active inflammation in the sacroiliac joints of patients with AS (43). TYMP catalyzes reversible thymidine phosphorylation and is suggested to have a critical effect on angiogenesis, tumor growth, migration, and invasion (44). A recent study showed that TNF-α strongly stimulates TYMP expression in fibroblast-like synoviocytes. Thus, we hypothesize that TNF-α may affect AS by inducing TYMP expression.
Additionally, the MR analysis results for TYMP in the study by Zhao et al. (20) indicated potential significance (P=0.0037), suggesting a potential causal role of this protein in the risk of AS. However, after p-value correction, TYMP was not selected as a candidate drug protein in their study. Furthermore, our drug-target candidate proteins (IL23R, TYMP, and IL7R) passed rigorous screening, with colocalization analysis strongly supporting their shared causal variants with AS in the region (PPH4 > 0.90). Moreover, our MR-PheWAS results showed almost no statistical evidence of adverse side effects for these candidate drug proteins, indicating that their development seems to be safe. Overall, these findings demonstrate the feasibility and practicality of our candidate proteins in drug development. We believe that the screening of these targeted proteins and the analysis of adverse effects can provide valuable references for research teams developing new targeted drugs for AS.
However, this study has certain limitations. First, because most proteins in this study were instrumented by a single cis-acting SNP and trans-pQTLs were not included, sensitivity tests for pleiotropy and heterogeneity could not be performed for these proteins. Nevertheless, the F-statistics for our selected SNPs were all greater than 10, indicating minimal weak instrument bias. Second, the coloc method assumes a single causal variant per trait, which may not be accurate, and we are unable to assess AS progression genes (as GWAS identifies AS susceptibility genes). Therefore, our focus is on finding targets for disease prevention rather than treatment. Third, the exposure and outcome data in this study came from populations of European ancestry. Further studies in non-European ancestries are needed before the conclusions can be applied to other regions, including Asia, Africa, and the Americas. Fourth, the results of the PPI analysis are suggestive rather than definitive, although we were able to identify certain connections between pathogenic proteins and the therapeutic targets of existing AS medications. Further research involving healthy individuals and patients with AS is needed to confirm these associations.

Conclusions
Our study identified eight plasma proteins significantly associated with ankylosing spondylitis risk using Mendelian randomization and colocalization analyses. We provided robust evidence for the causal roles of IL23R, IL7R, and TYMP, highlighting them as promising therapeutic targets. Phenome-wide MR analysis also evaluated potential side effects of these targets. Despite limitations, our findings enhance understanding of the genetic architecture of AS and support future research to validate these targets for clinical application.

FIGURE 1 A flowchart for Mendelian randomization (MR) identification of pathogenic plasma proteins in ankylosing spondylitis (AS).

FIGURE 2 Volcano plot of MR analysis of 734 plasma proteins for AS risk. OR, odds ratio per standard deviation increase in plasma protein levels. The dashed horizontal line represents P-fdr = 0.05; PVE, proportion of variance explained.

FIGURE 3 Colocalization and gene track plots for IL7R, IL23R, and TYMP. (A, C, E) display the colocalization plots for IL7R, IL23R, and TYMP. (B, D, F) present the corresponding gene track plots for these proteins, illustrating the -log10(P-value) along the chromosomal position. The gene locations and structures are shown below the association signals.

FIGURE 4 Phenome-wide association study (PheWAS) of IL7R, IL23R, and TYMP. This scatter plot displays the -log10(FDR-adjusted P) for the associations between IL7R, IL23R, and TYMP levels and various phenotypes across different categories. Each point represents a specific phenotype within a category, with colors indicating the corresponding protein. The horizontal line represents the significance threshold.

FIGURE 5 Replication and meta-analysis of three candidate proteins. Meta-analysis of IL7R (A), IL23R (B), and TYMP (C) using data from FinnGen cohorts.
Previously, Zhao et al. (20) studied drug targets for ankylosing spondylitis using the UK Biobank-PPP database. Given that the pQTL data used by Zhao et al. differ from the data source in our study, we conducted a comparative analysis of their findings. The results indicate that the target proteins we identified are entirely different from theirs. For instance, the absence of IL23R protein measurement in the UK Biobank led to its omission in Zhao et al.'s study, and its inclusion here extends the data coverage in our research.

FIGURE 6 Comparison of p-values between our study and Zhao et al.'s study. The plot categorizes proteins into three groups: identified in both studies (Both), only in our study (Our Only), and only in Zhao et al.'s study (Zhao Only). Highlighted are the p-values of candidate proteins IL23R, TYMP, and IL7R.

TABLE 1
Mendelian randomization analysis of plasma protein and ankylosing spondylitis after FDR correction.
OR, odds ratio per standard deviation increase in plasma protein levels; PVE, proportion of variance explained; IVW, inverse variance weighted.